Collaborator

@man-shu man-shu commented Sep 15, 2025

Relates to #306. With @AngelReyero.

For section 1 of the user guide, which contains the definition of all basic concepts.

@man-shu man-shu marked this pull request as draft September 15, 2025 10:53

codecov bot commented Sep 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.08%. Comparing base (cb984f8) to head (e0bb238).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #408   +/-   ##
=======================================
  Coverage   98.08%   98.08%           
=======================================
  Files          22       22           
  Lines        1148     1148           
=======================================
  Hits         1126     1126           
  Misses         22       22           

@man-shu man-shu changed the title from "Section 1 of user guide/definition of concepts" to "[DOC] Section 1 of user guide/definition of concepts" Sep 15, 2025
Collaborator

@bthirion bthirion left a comment

I think it would be useful for this section to include a typology of all VI methods.

Collaborator Author

@man-shu man-shu left a comment

Looks good overall.

Just wondering whether we should introduce the Total Sobol Index in the "Types of VI methods" section or some other place. The original issue #306 mentions it...

There are two main types of VI methods implemented in HiDimStat:

1. Marginal methods: these methods provide importance to all the features
that are related to the output, even if it is caused by spurius correlation. They
Collaborator Author

Suggested change
- that are related to the output, even if it is caused by spurius correlation. They
+ that are related to the output, even if it is caused by spurious correlation. They

1. Marginal methods: these methods provide importance to all the features
that are related to the output, even if it is caused by spurius correlation. They
are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
Example of such methods is LOCI.
Collaborator Author

I think it would be useful to provide a reference for LOCI, or at least expand the abbreviation.

Collaborator

Yes, I would also suggest adding a reference, but I don't think one is available yet.

Collaborator

For LOCI, I found this reference: Ewald, Fiona Katharina, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, and Gunnar König. "A guide to feature importance methods for scientific inference." In World Conference on Explainable Artificial Intelligence, pp. 440-464. Cham: Springer Nature Switzerland, 2024.

Collaborator

What I meant was a reference to the implemented class, not a bibliographic reference.

Collaborator Author

I think the bibliographic reference should be good enough for now.

Collaborator

The reference to the implementation should appear only in the class docstring. Here, we can keep a more general bibliographic reference.

1. Marginal methods: these methods provide importance to all the features
that are related to the output, even if it is caused by spurius correlation. They
are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
Example of such methods is LOCI.
Collaborator Author

Suggested change
- Example of such methods is LOCI.
+ An example of such a method is LOCI.

Comment on lines +73 to +76
i.e., they contribute unique knowledge. They are related with Conditional
Independence Testing, which consist in testing if
:math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are
:class:`hidimstat.LOCO` and :class:`hidimstat.CFI`.
Collaborator Author

Suggested change
- i.e., they contribute unique knowledge. They are related with Conditional
- Independence Testing, which consist in testing if
- :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are
- :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`.
+ i.e., they contribute unique knowledge. They are related to Conditional
+ Independence Testing, which consists of testing whether
+ :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are
+ :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`.

soon).

Variable Selection
-------------------------------
Collaborator Author

Suggested change
- -------------------------------
+ ------------------



High-dimension and correlation
-----------------------------------
Collaborator Author

Suggested change
- -----------------------------------
+ ------------------------------
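
Heading underlines like the two flagged in these suggestions can be checked mechanically. The sketch below is a hypothetical helper (`find_underline_mismatches` is not part of docutils, Sphinx, or HiDimStat) that reports section titles whose underline length differs from the title length. Note that docutils only requires the underline to be at least as long as the title, so an over-long underline is a style issue rather than a build error. The example underline length of 31 is illustrative.

```python
# Hypothetical helper for illustration; not part of docutils, Sphinx, or HiDimStat.
ADORNMENT_CHARS = set("=-~^\"'`#*+:.")


def is_adornment(line):
    """True if the line is a run of 3+ identical RST adornment characters."""
    return len(line) >= 3 and len(set(line)) == 1 and line[0] in ADORNMENT_CHARS


def find_underline_mismatches(rst_text):
    """Return (line_number, title, title_len, underline_len) for every
    section title whose underline length differs from the title length."""
    lines = rst_text.splitlines()
    issues = []
    for i in range(1, len(lines)):
        title = lines[i - 1].rstrip()
        underline = lines[i].rstrip()
        # A title is a non-empty line that is not itself an adornment line.
        title_is_text = bool(title) and not is_adornment(title)
        if is_adornment(underline) and title_is_text and len(underline) != len(title):
            issues.append((i + 1, title, len(title), len(underline)))
    return issues


# An over-long underline (illustrative length) and the trimmed version.
before = "Variable Selection\n" + "-" * 31 + "\n"
after = "Variable Selection\n" + "-" * 18 + "\n"

print(find_underline_mismatches(before))
print(find_underline_mismatches(after))
```

This is a simple line-pair heuristic: it does not parse tables or transitions, so it may report false positives on documents that use those constructs directly below a text line.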

Comment on lines +67 to +68
that are related to the output, even if it is caused by spurius correlation. They
are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
Collaborator Author

Suggested change
- that are related to the output, even if it is caused by spurius correlation. They
- are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`.
+ that are related to the output, even if it is caused by spurius correlation. They
+ consist of testing whether :math:`X^j\perp\!\!\!\!\perp Y`.

Collaborator Author

Maybe that sounds better?

Collaborator

That is because these methods do not directly test whether X is independent of Y: they are variable importance measures, not just selection procedures. That is why I would say they are implicitly related to this test, but they do not consist of it.

Collaborator Author

Ok makes sense!
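
To make the distinction in this thread concrete, here is a minimal sketch of a marginal score versus a conditional score on simulated data. It assumes simplified, R²-based versions of LOCI (fit a model on one feature alone) and LOCO (refit without one feature), written with scikit-learn only. It does not use or reproduce the HiDimStat classes, and the data-generating process is invented for illustration. Both quantities are importance scores, not tests: they relate to the independence hypotheses quoted above without deciding them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Simulated data (assumption for illustration only):
#   x0 drives y, x1 is correlated with x0 but has no effect of its own,
#   x2 is independent noise.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + 0.5 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + rng.normal(size=n)


def cv_r2(X_sub, y):
    """Mean 5-fold cross-validated R^2 of a linear model on the given columns."""
    return cross_val_score(LinearRegression(), X_sub, y, cv=5, scoring="r2").mean()


n_features = X.shape[1]
full_r2 = cv_r2(X, y)

# Marginal (LOCI-style) score: predictive performance of feature j alone.
loci = [cv_r2(X[:, [j]], y) for j in range(n_features)]

# Conditional (LOCO-style) score: loss in performance when feature j is removed.
loco = [full_r2 - cv_r2(np.delete(X, j, axis=1), y) for j in range(n_features)]

for j in range(n_features):
    print(f"feature {j}: marginal={loci[j]:.3f}  conditional={loco[j]:.3f}")
```

In this setup the correlated feature `x1` gets a clearly positive marginal score because it carries information about `x0`, while its conditional score is close to zero because it adds nothing once `x0` is in the model.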

statistical control to the discoveries made. Simply selecting the most important
features without such control is not valid. Different forms of guarantees can
be employed, such as controlling the type-I error or the False Discovery Rate.
This step is directly related to the task of Variable Selection.
Collaborator Author

I might be very wrong, but isn't this section somewhat redundant with the Variable Selection section? Could it be merged into that section?

Collaborator

Yes, but I am not sure how. It is important to state explicitly that a strength of the library is that it also provides statistical guarantees.
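
As an illustration of the kind of guarantee the quoted passage mentions, here is a minimal NumPy sketch of the Benjamini-Hochberg procedure for False Discovery Rate control, next to a Bonferroni threshold for family-wise type-I error control. The function name and the p-values are made up for the example; this is not the HiDimStat implementation.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of discoveries under the Benjamini-Hochberg
    step-up procedure at target FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value is compared with alpha * k / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    selected = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest k that passes its threshold.
        k = np.max(np.nonzero(below)[0])
        selected[order[: k + 1]] = True
    return selected


# Made-up p-values, one per feature.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

fdr_selected = benjamini_hochberg(p_values, alpha=0.05)
# Bonferroni: compare each p-value with alpha / m.
bonferroni_selected = np.asarray(p_values) <= 0.05 / len(p_values)

print("BH discoveries:        ", np.nonzero(fdr_selected)[0].tolist())
print("Bonferroni discoveries:", np.nonzero(bonferroni_selected)[0].tolist())
```

Simply keeping the top-ranked features would give no such control; either procedure ties the selection to an explicit error rate.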

It allow us to rank the variables from more to less important.

Here, ``VI`` can be a variable importance method implemented in HiDimStat,
such as :class:`hidimstat.LOCO` (other methods will support the same API
Collaborator

It would be better to give the full name of the method before introducing its acronym.
